Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search
نویسندگان
چکیده
The query complexity of locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size (Indyk & Motwani, 1998). In industrial applications such as search where the data are often high-dimensional and binary (e.g., text n-grams), minwise hashing is widely adopted, which requires applying a large number of permutations on the data. This is costly in computation and energy-consumption. In this paper, we propose a hashing technique which generates all the necessary hash evaluations needed for similarity search, using one single permutation. The heart of the proposed hash function is a “rotation” scheme which densifies the sparse sketches of one permutation hashing (Li et al., 2012) in an unbiased fashion thereby maintaining the LSH property. This makes the obtained sketches suitable for hash table construction. This idea of rotation presented in this paper could be of independent interest for densifying other types of sparse sketches. Using our proposed hashing method, the query time of a (K,L)-parameterized LSH is reduced from the typical O(dKL) complexity to merely O(KL + dL), where d is the number of nonzeros of the data vector,K is the number of hashes in each hash table, and L is the number of hash tables. Our experimental evaluation on real data confirms that the proposed scheme significantly reduces the query processing time over minwise hashing without loss in retrieval accuracies.
منابع مشابه
Fast Near Neighbor Search in High-Dimensional Binary Data
Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating has...
متن کاملImproved Densification of One Permutation Hashing
The existing work on densification of one permutation hashing [24] reduces the query processing cost of the (K,L)-parameterized Locality Sensitive Hashing (LSH) algorithm with minwise hashing, from O(dKL) to merely O(d + KL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. While that is a substantial improve...
متن کاملOne Permutation Hashing for Efficient Search and Learning
Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, the method of b-bit minwise hashing has been applied to large-scale linear learning (e.g., linear SVM or logistic regression) and sublinear time near-neighbor search. The major drawback of minwise hashing is the expensive preprocessing cost...
متن کاملAn algorithm for L1 nearest neighbor search via monotonic embedding
Fast algorithms for nearest neighbor (NN) search have in large part focused on 2 distance. Here we develop an approach for 1 distance that begins with an explicit and exactly distance-preserving embedding of the points into 2. We show how this can efficiently be combined with random-projection based methods for 2 NN search, such as locality-sensitive hashing (LSH) or random projection trees. We...
متن کاملAn algorithm for `1 nearest neighbor search via monotonic embedding
Fast algorithms for nearest neighbor (NN) search have in large part focused on `2 distance. Here we develop an approach for `1 distance that begins with an explicit and exactly distance-preserving embedding of the points into `2. We show how this can efficiently be combined with random-projection based methods for `2 NN search, such as locality-sensitive hashing (LSH) or random projection trees...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014